library(tidyverse)
library(readxl)
candy_2015 <- read_excel("../raw_data/boing-boing-candy-2015.xlsx")
candy_2016 <- read_excel("../raw_data/boing-boing-candy-2016.xlsx")
candy_2017 <- read_excel("../raw_data/boing-boing-candy-2017.xlsx")
New names:
* `` -> ...114
here::here()
[1] "C:/Users/mahri/OneDrive/CodeClan/dirty_data_project/dirty_data_codeclan_project_mahri/dirty_data_task_4_mahri"
From glimpse
head(candy_2015)
glimpse(candy_2016)
Rows: 1,259
Columns: 123
$ Timestamp <dttm> ~
$ `Are you going actually going trick or treating yourself?` <chr> ~
$ `Your gender:` <chr> ~
$ `How old are you?` <chr> ~
$ `Which country do you live in?` <chr> ~
$ `Which state, province, county do you live in?` <chr> ~
$ `[100 Grand Bar]` <chr> ~
$ `[Anonymous brown globs that come in black and orange wrappers]` <chr> ~
$ `[Any full-sized candy bar]` <chr> ~
$ `[Black Jacks]` <chr> ~
$ `[Bonkers (the candy)]` <chr> ~
$ `[Bonkers (the board game)]` <chr> ~
$ `[Bottle Caps]` <chr> ~
$ `[Box'o'Raisins]` <chr> ~
$ `[Broken glow stick]` <chr> ~
$ `[Butterfinger]` <chr> ~
$ `[Cadbury Creme Eggs]` <chr> ~
$ `[Candy Corn]` <chr> ~
$ `[Candy that is clearly just the stuff given out for free at restaurants]` <chr> ~
$ `[Caramellos]` <chr> ~
$ `[Cash, or other forms of legal tender]` <chr> ~
$ `[Chardonnay]` <chr> ~
$ `[Chick-o-Sticks (we don’t know what that is)]` <chr> ~
$ `[Chiclets]` <chr> ~
$ `[Coffee Crisp]` <chr> ~
$ `[Creepy Religious comics/Chick Tracts]` <chr> ~
$ `[Dental paraphenalia]` <chr> ~
$ `[Dots]` <chr> ~
$ `[Dove Bars]` <chr> ~
$ `[Fuzzy Peaches]` <chr> ~
$ `[Generic Brand Acetaminophen]` <chr> ~
$ `[Glow sticks]` <chr> ~
$ `[Goo Goo Clusters]` <chr> ~
$ `[Good N' Plenty]` <chr> ~
$ `[Gum from baseball cards]` <chr> ~
$ `[Gummy Bears straight up]` <chr> ~
$ `[Hard Candy]` <chr> ~
$ `[Healthy Fruit]` <chr> ~
$ `[Heath Bar]` <chr> ~
$ `[Hershey's Dark Chocolate]` <chr> ~
$ `[Hershey’s Milk Chocolate]` <chr> ~
$ `[Hershey's Kisses]` <chr> ~
$ `[Hugs (actual physical hugs)]` <chr> ~
$ `[Jolly Rancher (bad flavor)]` <chr> ~
$ `[Jolly Ranchers (good flavor)]` <chr> ~
$ `[JoyJoy (Mit Iodine!)]` <chr> ~
$ `[Junior Mints]` <chr> ~
$ `[Senior Mints]` <chr> ~
$ `[Kale smoothie]` <chr> ~
$ `[Kinder Happy Hippo]` <chr> ~
$ `[Kit Kat]` <chr> ~
$ `[LaffyTaffy]` <chr> ~
$ `[LemonHeads]` <chr> ~
$ `[Licorice (not black)]` <chr> ~
$ `[Licorice (yes black)]` <chr> ~
$ `[Lindt Truffle]` <chr> ~
$ `[Lollipops]` <chr> ~
$ `[Mars]` <chr> ~
$ `[Mary Janes]` <chr> ~
$ `[Maynards]` <chr> ~
$ `[Mike and Ike]` <chr> ~
$ `[Milk Duds]` <chr> ~
$ `[Milky Way]` <chr> ~
$ `[Regular M&Ms]` <chr> ~
$ `[Peanut M&M’s]` <chr> ~
$ `[Blue M&M's]` <chr> ~
$ `[Red M&M's]` <chr> ~
$ `[Third Party M&M's]` <chr> ~
$ `[Minibags of chips]` <chr> ~
$ `[Mint Kisses]` <chr> ~
$ `[Mint Juleps]` <chr> ~
$ `[Mr. Goodbar]` <chr> ~
$ `[Necco Wafers]` <chr> ~
$ `[Nerds]` <chr> ~
$ `[Nestle Crunch]` <chr> ~
$ `[Now'n'Laters]` <chr> ~
$ `[Peeps]` <chr> ~
$ `[Pencils]` <chr> ~
$ `[Person of Interest Season 3 DVD Box Set (not including Disc 4 with hilarious outtakes)]` <chr> ~
$ `[Pixy Stix]` <chr> ~
$ `[Reese’s Peanut Butter Cups]` <chr> ~
$ `[Reese's Pieces]` <chr> ~
$ `[Reggie Jackson Bar]` <chr> ~
$ `[Rolos]` <chr> ~
$ `[Skittles]` <chr> ~
$ `[Smarties (American)]` <chr> ~
$ `[Smarties (Commonwealth)]` <chr> ~
$ `[Snickers]` <chr> ~
$ `[Sourpatch Kids (i.e. abominations of nature)]` <chr> ~
$ `[Spotted Dick]` <chr> ~
$ `[Starburst]` <chr> ~
$ `[Sweet Tarts]` <chr> ~
$ `[Swedish Fish]` <chr> ~
$ `[Sweetums (a friend to diabetes)]` <chr> ~
$ `[Tic Tacs]` <chr> ~
$ `[Those odd marshmallow circus peanut things]` <chr> ~
$ `[Three Musketeers]` <chr> ~
$ `[Tolberone something or other]` <chr> ~
$ `[Trail Mix]` <chr> ~
$ `[Twix]` <chr> ~
$ `[Vials of pure high fructose corn syrup, for main-lining into your vein]` <chr> ~
$ `[Vicodin]` <chr> ~
$ `[Whatchamacallit Bars]` <chr> ~
$ `[White Bread]` <chr> ~
$ `[Whole Wheat anything]` <chr> ~
$ `[York Peppermint Patties]` <chr> ~
$ `Please list any items not included above that give you JOY.` <chr> ~
$ `Please list any items not included above that give you DESPAIR.` <chr> ~
$ `Please leave any witty, snarky or thoughtful remarks or comments regarding your choices.` <chr> ~
$ `Guess the number of mints in my hand.` <chr> ~
$ `Betty or Veronica?` <chr> ~
$ `"That dress* that went viral a few years back - when I first saw it, it was ________"` <chr> ~
$ `What is your favourite font?` <chr> ~
$ `Please estimate the degree(s) of separation you have from the following celebrities [JK Rowling]` <chr> ~
$ `Please estimate the degree(s) of separation you have from the following celebrities [JJ Abrams]` <chr> ~
$ `Please estimate the degree(s) of separation you have from the following celebrities [Beyoncé]` <chr> ~
$ `Please estimate the degree(s) of separation you have from the following celebrities [Bieber]` <chr> ~
$ `Please estimate the degree(s) of separation you have from the following celebrities [Kevin Bacon]` <chr> ~
$ `Please estimate the degree(s) of separation you have from the following celebrities [Francis Bacon (1561 - 1626)]` <chr> ~
$ `Which day do you prefer, Friday or Sunday?` <chr> ~
$ `Do you eat apples the correct way, East to West (side to side) or do you eat them like a freak of nature, South to North (bottom to top)?` <chr> ~
$ `When you see the above image of the 4 different websites, which one would you most likely check out (please be honest).` <chr> ~
$ `[York Peppermint Patties] Ignore` <lgl> ~
glimpse(candy_2017)
Rows: 2,460
Columns: 120
$ `Internal ID` <dbl> 90258773, 90272821, 902~
$ `Q1: GOING OUT?` <chr> NA, "No", NA, "No", "No~
$ `Q2: GENDER` <chr> NA, "Male", "Male", "Ma~
$ `Q3: AGE` <chr> NA, "44", "49", "40", "~
$ `Q4: COUNTRY` <chr> NA, "USA", "USA", "us",~
$ `Q5: STATE, PROVINCE, COUNTY, ETC` <chr> NA, "NM", "Virginia", "~
$ `Q6 | 100 Grand Bar` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Any full-sized candy bar` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Black Jacks` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Bonkers (the candy)` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Bonkers (the board game)` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Bottle Caps` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Box'o'Raisins` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Broken glow stick` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Butterfinger` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Cadbury Creme Eggs` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Candy Corn` <chr> NA, "MEH", NA, "DESPAIR~
$ `Q6 | Candy that is clearly just the stuff given out for free at restaurants` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Caramellos` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Cash, or other forms of legal tender` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Chardonnay` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Chick-o-Sticks (we don’t know what that is)` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | Chiclets` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Coffee Crisp` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Creepy Religious comics/Chick Tracts` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Dental paraphenalia` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Dots` <chr> NA, "MEH", NA, "JOY", "~
$ `Q6 | Dove Bars` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Fuzzy Peaches` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Generic Brand Acetaminophen` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Glow sticks` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | Goo Goo Clusters` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | Good N' Plenty` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Gum from baseball cards` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Gummy Bears straight up` <chr> NA, "MEH", NA, "JOY", "~
$ `Q6 | Hard Candy` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Healthy Fruit` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Heath Bar` <chr> NA, "MEH", NA, "DESPAIR~
$ `Q6 | Hershey's Dark Chocolate` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Hershey’s Milk Chocolate` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Hershey's Kisses` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Hugs (actual physical hugs)` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Jolly Rancher (bad flavor)` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Jolly Ranchers (good flavor)` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | JoyJoy (Mit Iodine!)` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Junior Mints` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Senior Mints` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Kale smoothie` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Kinder Happy Hippo` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Kit Kat` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | LaffyTaffy` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | LemonHeads` <chr> NA, "MEH", NA, "JOY", "~
$ `Q6 | Licorice (not black)` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Licorice (yes black)` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Lindt Truffle` <chr> NA, "MEH", NA, "JOY", "~
$ `Q6 | Lollipops` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Mars` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | Maynards` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Mike and Ike` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Milk Duds` <chr> NA, "MEH", NA, "DESPAIR~
$ `Q6 | Milky Way` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Regular M&Ms` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Peanut M&M’s` <chr> NA, "MEH", NA, "JOY", "~
$ `Q6 | Blue M&M's` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Red M&M's` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Green Party M&M's` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Independent M&M's` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Abstained from M&M'ing.` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Minibags of chips` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Mint Kisses` <chr> NA, "MEH", NA, "DESPAIR~
$ `Q6 | Mint Juleps` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Mr. Goodbar` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Necco Wafers` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Nerds` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Nestle Crunch` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Now'n'Laters` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Peeps` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Pencils` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Pixy Stix` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Real Housewives of Orange County Season 9 Blue-Ray` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Reese’s Peanut Butter Cups` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Reese's Pieces` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Reggie Jackson Bar` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Rolos` <chr> NA, "JOY", NA, "MEH", "~
$ `Q6 | Sandwich-sized bags filled with BooBerry Crunch` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Skittles` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Smarties (American)` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Smarties (Commonwealth)` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Snickers` <chr> NA, "MEH", NA, "JOY", N~
$ `Q6 | Sourpatch Kids (i.e. abominations of nature)` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Spotted Dick` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Starburst` <chr> NA, "MEH", NA, "MEH", "~
$ `Q6 | Sweet Tarts` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Swedish Fish` <chr> NA, "MEH", NA, "JOY", "~
$ `Q6 | Sweetums (a friend to diabetes)` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Take 5` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | Tic Tacs` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Those odd marshmallow circus peanut things` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Three Musketeers` <chr> NA, "JOY", NA, "DESPAIR~
$ `Q6 | Tolberone something or other` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Trail Mix` <chr> NA, "DESPAIR", NA, "MEH~
$ `Q6 | Twix` <chr> NA, "JOY", NA, "JOY", "~
$ `Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Vicodin` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | Whatchamacallit Bars` <chr> NA, "DESPAIR", NA, "JOY~
$ `Q6 | White Bread` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | Whole Wheat anything` <chr> NA, "DESPAIR", NA, "DES~
$ `Q6 | York Peppermint Patties` <chr> NA, "DESPAIR", NA, "DES~
$ `Q7: JOY OTHER` <chr> NA, "Mounds", NA, "Rees~
$ `Q8: DESPAIR OTHER` <chr> NA, NA, NA, NA, NA, NA,~
$ `Q9: OTHER COMMENTS` <chr> NA, "Bottom line is Twi~
$ `Q10: DRESS` <chr> NA, "White and gold", N~
$ ...114 <chr> NA, NA, NA, NA, NA, NA,~
$ `Q11: DAY` <chr> NA, "Sunday", NA, "Sund~
$ `Q12: MEDIA [Daily Dish]` <dbl> NA, NA, NA, NA, NA, NA,~
$ `Q12: MEDIA [Science]` <dbl> NA, 1, NA, 1, 1, 1, NA,~
$ `Q12: MEDIA [ESPN]` <dbl> NA, NA, NA, NA, NA, NA,~
$ `Q12: MEDIA [Yahoo]` <dbl> NA, NA, NA, NA, NA, NA,~
$ `Click Coordinates (x, y)` <chr> NA, "(84, 25)", NA, "(7~
library(janitor)
janitor_candy_2015 <- janitor::clean_names(candy_2015)
janitor_candy_2015
janitor_candy_2016 <- janitor::clean_names(candy_2016)
janitor_candy_2016
janitor_candy_2017 <- janitor::clean_names(candy_2017)
janitor_candy_2017
just looking at who is reporting back about these ones…
janitor_candy_2017 %>%
select(q3_age, q2_gender, q6_independent_m_ms, q6_green_party_m_ms)
# all types of people
janitor_candy_2016 %>%
select(york_peppermint_patties_ignore)
#nobody
REMOVE AND RENAME
Step 1 - Remove and add for each year
Step 2 - RENAME FOR EACH YEAR
2015 REMOVE AND ADD
# names(janitor_candy_2015)
col_removed_candy_2015 <- janitor_candy_2015 %>%
select(-c(116:124), -c(97:113), -c(93:95), -c(90, 91),
-c(peterson_brand_sidewalk_chalk, spotted_dick, mint_leaves,
joy_joy_mit_iodine, minibags_of_chips, lapel_pins, kale_smoothie,
hugs_actual_physical_hugs, heath_bar, healthy_fruit,
creepy_religious_comics_chick_tracts, broken_glow_stick,
glow_sticks, generic_brand_acetaminophen, dental_paraphenalia,
cash_or_other_forms_of_legal_tender,
vials_of_pure_high_fructose_corn_syrup_for_main_lining_into_your_vein,
box_o_raisins, timestamp)) %>%
add_column(year = "2015", .before = 1) %>%
mutate(id_number = row_number(), .before = 2)
col_removed_candy_2015
#view(col_removed_candy_2015)
2015 RENAME
# Looking at differences between responses for Mary Janes thoughts
# col_removed_candy_2015 %>%
# select(anonymous_brown_globs_that_come_in_black_and_orange_wrappers, mary_janes)
candy_2015_renamed <- col_removed_candy_2015 %>%
rename(age = how_old_are_you,
trick_or_treating = are_you_going_actually_going_trick_or_treating_yourself,
anonymous_black_and_orange_wrapper =
anonymous_brown_globs_that_come_in_black_and_orange_wrappers,
brach_not_including_candy_corn = brach_products_not_including_candy_corn,
restaurant_candy =
candy_that_is_clearly_just_the_stuff_given_out_for_free_at_restaurants,
hersheys_dark_chocolate = dark_chocolate_hershey,
gummy_bears = gummy_bears_straight_up,
hersheys_kissables = hershey_s_kissables,
hersheys_milk_chocolate = hershey_s_milk_chocolate,
licorice_black = licorice,
reeses_peanut_butter_cups = reese_s_peanut_butter_cups,
toblerone = tolberone_something_or_other,
peanut_m_ms = peanut_m_m_s,
chick_o_stick = chick_o_sticks_we_don_t_know_what_that_is,
circus_peanuts = those_odd_marshmallow_circus_peanut_things,
sea_salt_chocolate =
sea_salt_flavored_stuff_probably_chocolate_since_this_is_the_it_flavor_of_the_year)
2016 - REMOVE AND ADD
col_removed_candy_2016 <- janitor_candy_2016 %>%
select(-c(104, 105, 107:123),
-c(vicodin, vials_of_pure_high_fructose_corn_syrup_for_main_lining_into_your_vein,
trail_mix, spotted_dick,
person_of_interest_season_3_dvd_box_set_not_including_disc_4_with_hilarious_outtakes,
minibags_of_chips, kale_smoothie, joy_joy_mit_iodine, hugs_actual_physical_hugs,
heath_bar, healthy_fruit, glow_sticks, generic_brand_acetaminophen,
dental_paraphenalia, creepy_religious_comics_chick_tracts, chardonnay,
cash_or_other_forms_of_legal_tender, broken_glow_stick, boxo_raisins,
bonkers_the_board_game, timestamp)) %>%
add_column(year = "2016", .before = 1) %>%
mutate(id_number = max(candy_2015_renamed$id_number) + row_number(), .before = 2)
col_removed_candy_2016
2016 RENAME
# Again checking on Mary Janes column differences
# col_removed_candy_2016 %>%
# select(anonymous_brown_globs_that_come_in_black_and_orange_wrappers, mary_janes)
candy_2016_renamed <- col_removed_candy_2016 %>%
rename(trick_or_treating =
are_you_going_actually_going_trick_or_treating_yourself,
gender = your_gender,
age = how_old_are_you,
country = which_country_do_you_live_in,
state_or_prov = which_state_province_county_do_you_live_in,
anonymous_black_and_orange_wrapper =
anonymous_brown_globs_that_come_in_black_and_orange_wrappers,
bonkers = bonkers_the_candy,
restaurant_candy =
candy_that_is_clearly_just_the_stuff_given_out_for_free_at_restaurants,
chick_o_stick = chick_o_sticks_we_don_t_know_what_that_is,
gummy_bears = gummy_bears_straight_up,
hersheys_milk_chocolate = hershey_s_milk_chocolate,
licorice_black = licorice_yes_black,
peanut_m_ms = peanut_m_m_s,
party_bag_m_ms = third_party_m_ms,
reeses_peanut_butter_cups = reese_s_peanut_butter_cups,
sourpatch_kids = sourpatch_kids_i_e_abominations_of_nature,
sweetarts = sweet_tarts,
sweetums = sweetums_a_friend_to_diabetes,
circus_peanuts = those_odd_marshmallow_circus_peanut_things,
toblerone = tolberone_something_or_other)
candy_2016_renamed
NA
2017 CLEAN AND ADD Note - should have removed the “q6_” before this, but have done so in the next chunk
col_removed_candy_2017 <- janitor_candy_2017 %>%
select(-c(102, 104, 105, 107, 108, 110:120),
-c(q6_spotted_dick,
q6_sandwich_sized_bags_filled_with_boo_berry_crunch,
q6_real_housewives_of_orange_county_season_9_blue_ray,
q6_minibags_of_chips,
q6_abstained_from_m_ming,
q6_kale_smoothie, q6_joy_joy_mit_iodine,
q6_hugs_actual_physical_hugs,
q6_heath_bar,
q6_healthy_fruit,
q6_glow_sticks,
q6_generic_brand_acetaminophen,
q6_dental_paraphenalia,
q6_creepy_religious_comics_chick_tracts,
q6_chardonnay,
q6_cash_or_other_forms_of_legal_tender,
q6_broken_glow_stick,
q6_boxo_raisins,
q6_bonkers_the_board_game,
internal_id)) %>%
add_column(year = "2017", .before = 1) %>%
mutate(id_number = max(candy_2016_renamed$id_number) + row_number(), .before = 2)
col_removed_candy_2017
2017 RENAME - get rid of “q1/2/3/4/5/6” at the start of col names and rename to match 2015 and 16
candy_2017_q_removed <- col_removed_candy_2017 %>%
rename_all(~ sub("^[q0-9]{2}_", "",
make.names(names(col_removed_candy_2017))))
candy_2017_renamed <- candy_2017_q_removed %>%
rename(trick_or_treating = going_out,
state_or_prov = state_province_county_etc,
x100_grand_bar = `100_grand_bar`,
mary_janes =
anonymous_brown_globs_that_come_in_black_and_orange_wrappers_a_k_a_mary_janes,
bonkers = bonkers_the_candy,
restaurant_candy =
candy_that_is_clearly_just_the_stuff_given_out_for_free_at_restaurants,
chick_o_stick = chick_o_sticks_we_don_t_know_what_that_is,
gummy_bears = gummy_bears_straight_up,
hersheys_milk_chocolate = hershey_s_milk_chocolate,
licorice_black = licorice_yes_black,
peanut_m_ms = peanut_m_m_s,
green_m_ms = green_party_m_ms,
lone_m_ms = independent_m_ms,
reeses_peanut_butter_cups = reese_s_peanut_butter_cups,
sourpatch_kids = sourpatch_kids_i_e_abominations_of_nature,
sweetarts = sweet_tarts,
sweetums = sweetums_a_friend_to_diabetes,
circus_peanuts = those_odd_marshmallow_circus_peanut_things,
toblerone = tolberone_something_or_other)
candy_2017_renamed
view(candy_2015_renamed)
view(candy_2016_renamed)
view(candy_2017_renamed)
Getting an idea of people’s responses:
distinct(candy_2015_renamed, age)
#(chr) 146 responses, some silly and some strange
distinct(candy_2015_renamed, trick_or_treating)
#(chr) yes or no (NAs in 2017)
distinct(candy_2015_renamed, starburst)
distinct(candy_2016_renamed, age)
# (chr) 98 incl silly/strange
distinct(candy_2016_renamed, trick_or_treating)
#(chr) Yes No (NAs in 2017)
distinct(candy_2016_renamed, gender)
# Male, Female, Other, I'd rather not say, NA
distinct(candy_2016_renamed, country)
#93 some silly, some e.g. USA, US, us, u.s.a. etc
distinct(candy_2016_renamed, starburst)
distinct(candy_2017_renamed, age)
#(chr) 107 incl silly/strange
distinct(candy_2017_renamed, trick_or_treating)
#(chr) Yes No and NA
distinct(candy_2017_renamed, gender)
#Male, Female, Other, I'd rather not say, NA
distinct(candy_2017_renamed, country)
#118 some silly, some e.g. USA, US, us, u.s.a. etc
distinct(candy_2017_renamed, starburst)
Joining all three years by binding rows so as to keep everything
bound_candy <- bind_rows(candy_2015_renamed,
candy_2016_renamed,
candy_2017_renamed)
bound_candy <- bound_candy %>%
relocate(country, .before = 5) %>%
relocate(state_or_prov, .before = 6) %>%
relocate(gender, .before = 7)
# view(bound_candy)
QUESTION 1 What is the total number of candy ratings given across the three years. (Number of candy ratings, not the number of raters. Don’t count missing values)
ANSWER - 590,010 ratings
total_ratings <- bound_candy %>%
select(-c(year, id_number, age,
trick_or_treating, country,
state_or_prov, gender)) %>%
pivot_longer(butterfinger:take_5,
names_to = "candy",
values_to = "rating") %>%
filter(!is.na(rating)) %>%
count(n())
total_ratings
NA
AGE CLEANING
Age is a character column with 274 distinct values. They are a mix of numbers, NAs, and strange and silly values. * I changed the “age” column to a numeric but it output strange figures, so I specified “as.integer” and the strange answers became NAs. * It seems unlikely that anyone over 100 years old is taking part, so I removed them.
# bound_candy %>%
# distinct(age)
bound_age_to_numeric <- bound_candy %>%
mutate(age = as.integer(age)) %>%
arrange(age)
Warning: Problem while computing `age = as.integer(age)`.
i NAs introduced by coercion
Warning: Problem while computing `age = as.integer(age)`.
i NAs introduced by coercion to integer range
# bound_age_to_numeric %>%
# distinct(age)
bound_age_cleaning <- bound_age_to_numeric %>%
mutate(age = if_else(age > 99, NA_integer_, age))
bound_age_cleaning
# view(bound_age_cleaning)
QUESTION TWO What was the average age of people who are going out trick or treating?
ANSWER - The (rounded) average age of those going trick or treating is 35 years old (34.94897 is the unrounded value)
bound_age_cleaning %>%
select(age, trick_or_treating) %>%
group_by(trick_or_treating) %>%
summarise(average_age = round(mean(age, na.rm = TRUE)))
# to just get the answer for Yes on it's own:
bound_age_cleaning %>%
select(age, trick_or_treating) %>%
group_by(trick_or_treating) %>%
filter(trick_or_treating == "Yes") %>%
summarise(average_age = round(mean(age, na.rm = TRUE)))
NA
QUESTION THREE What was the average age of people who are not going trick or treating?
ANSWER - The (rounded) average age of those not going trick or treating is 39 years old (39.10454 is the unrounded value)
bound_age_cleaning %>%
select(age, trick_or_treating) %>%
group_by(trick_or_treating) %>%
summarise(average_age = round(mean(age, na.rm = TRUE)))
# to just get the answer for No on it's own:
bound_age_cleaning %>%
select(age, trick_or_treating) %>%
group_by(trick_or_treating) %>%
filter(trick_or_treating == "No") %>%
summarise(average_age = round(mean(age, na.rm = TRUE)))
NA
QUESTION FOUR For each of joy, despair and meh, which candy bar revived the most of these ratings?
ANSWER
* Despair: gum that comes with baseball cards returned the most despair
responses with 7,341 * Joy: Full sized candy bars made the most people
joyful with 7,589 responses. This seems very generic so I ran it again
to remove the full sized candy bars and the top Joy response was: 7369
responses for reeses peanut butter cups * Meh: 1,570 “Meh” responses
were given for lollipops
rated_candy <- bound_candy %>%
select(-c(year, id_number, age,
trick_or_treating, country,
state_or_prov, gender, any_full_sized_candy_bar)) %>%
pivot_longer(butterfinger:take_5,
names_to = "candy",
values_to = "rating")
rated_candy
rated_candy %>%
group_by(rating, candy) %>%
summarise(number_of_ratings = n()) %>%
filter(number_of_ratings == max(number_of_ratings))
`summarise()` has grouped output by 'rating'. You can override using the `.groups` argument.
# run again removing generic "full sized candy bars"
rated_candy <- bound_candy %>%
select(-c(year, id_number, age,
trick_or_treating, country,
state_or_prov, gender, any_full_sized_candy_bar)) %>%
pivot_longer(butterfinger:take_5,
names_to = "candy",
values_to = "rating")
rated_candy
rated_candy %>%
group_by(rating, candy) %>%
summarise(number_of_ratings = n()) %>%
filter(number_of_ratings == max(number_of_ratings))
`summarise()` has grouped output by 'rating'. You can override using the `.groups` argument.
QUESTION FIVE How many people rated Starburst as despair?
ANSWER - 1990 people voted despair for starbursts.
rated_candy %>%
filter(candy == "starburst") %>%
group_by(rating) %>%
summarise(rating_count = n())
Error in filter(., candy == "starburst") : object 'rated_candy' not found
COUNTRY CLEANING
Firstly getting an idea of NAs and distinct country values:
library(stringr)
Using stringr and regex to reduce “country” values
This started out a lot longer, and I am sure there are faster/ better ways of doing this, but I was practicing different options (If it were shorter - I am not sure about the difficulties that may arise if other countries were added at a later date?)
For columns that were numbers, inputs like “N. America”, or clearly fake (silly) answers, I checked the state or province column to see if there was a match to a country
I checked my work one row at a time to try and ensure I did not change anything that was not meant to be changed.
Tried to be careful when changing numbers as some = States according to their state/province column and some do not equal anything.
Accidentally changed some words like A(a)ustralia to A(a)States, and Austria to Statesa… which is ok as I am just including them as “other world”
There are issues with “Statesof A”(there are 4), “the best one -States”(1), “TheStates”(2), “The States” (1), and a couple of others but as there are so few so I have moved on
Cascadia includes Canada and States so I left it as is
United Kingdom is inclusive of “Endland”, “England”, “Scotland” and variations of “UK”
Those under “Ireland” are all in the Republic
There are some more notes on specific pieces of code within the code chunk
Other Countries: Germany germany AStates (Australia) aStates
(australia) Statesa (Austria) Japan Mexico Netherlands The Netherlands
netherlands Sweden belgium Ireland New ZeStates (New Zealand)
Switzerland China France france Denmark Korea South Korea Brasil
cascadia Cascadia Costa Rica croatia españa spain South africa
Europe
Finland
finland Greece hong kong Hong Kong
hungary Iceland Indonesia kenya Not theStatesor Canada Panama
Philippines Portugal Singapore sweden Taiwan UAE
Strange / tricky States ones I didn’t get to: Can Statesof A Sub-Canadian North America… ’Merica TheStates 1 A A tropical island south of the equator Atlantis Denial Earth EUA Fear and Loathing I don’t know anymore god’s country insanity lately N. America (the state is in canada) I pretend to be from Canada, but I am really from theStates Narnia Neverland one of the best ones See above Somewhere States(I think but it’s an election year so who can really tell) States? Hard to tell anymore. Statess subscribe to dm4uz3 on youtube the best one -States The republic of Cascadia (this is half States half Canada) The States The Yoo Ess of Aaayyyyyy there isn’t one for old men this one UD United States of America
For the next three questions, count despair as -1, joy as +1, and meh as 0.
candy_pivot_ratings <- bound_country_clean %>%
select(-c(id_number,
age,
trick_or_treating,
state_or_prov)) %>%
pivot_longer(butterfinger:take_5,
names_to = "candy",
values_to = "rating") %>%
mutate(ratings_numeric = case_when(rating == "JOY" ~ 1,
rating == "DESPAIR" ~ -1,
rating == "MEH" ~ 0)
)
candy_pivot_ratings
#view(candy_pivot_ratings)
QUESTION SIX What was the most popular candy bar by this rating system for each gender in the dataset?
ANSWER The most popular candy bar for all four groups was “any full sized candy bar”. But, as decided in a previous question, this seems too generic so I would suggest the following top results: * “Female”: Reese’s Peanut Butter Cups. * “Male”: also Reese’s Peanut Butter Cups. * “Other”: Twix * “I’d rather not say”: Kit-Kat
candy_pivot_ratings %>%
# filter(gender == "Female") %>%
group_by(gender, candy) %>%
summarise(rating_count = mean(ratings_numeric, na.rm = TRUE)) %>%
arrange(desc(rating_count))
`summarise()` has grouped output by 'gender'. You can override using the `.groups` argument.
QUESTION SEVEN What was the most popular candy bar in each year?
ANSWER * Again, the same “full sized candy bar” arguement re: it’s too generic means that the most popular candy bar in each year was: * 2015: Reese’s Peanut Butter Cups. * 2016: Kit-Kat. * 2017: Reese’s Peanut Butter Cups.
candy_pivot_ratings %>%
group_by(year, candy) %>%
summarise(rating_count = mean(ratings_numeric, na.rm = TRUE)) %>%
arrange(desc(rating_count))
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
QUESTION EIGHT What was the most popular candy bar by this rating for people in US, Canada, UK, and all other countries?
ANSWER “Any full sized candy bar” was ignored again due to being too generic, therefore each countries most popular candy was:
United States: Reese’s peanut butter cups Canada: Kit-Kat UK: Rolos The rest of the world: Kit-Kat